Using substitution probabilities to improve position-specific scoring matrices
Authors
Abstract
Each column of amino acids in a multiple alignment of protein sequences can be represented as a vector of 20 amino acid counts. For alignment and searching applications, the count vector is an imperfect representation of a position, because the observed sequences are an incomplete sample of the full set of related sequences. One general solution to this problem is to model unobserved sequences by adding artificial 'pseudo-counts' to the observed counts. We introduce a simple method for computing pseudo-counts that combines the diversity observed in each alignment position with amino acid substitution probabilities. In extensive empirical tests, this position-based method out-performed other pseudo-count methods and was a substantial improvement over the traditional average score method used for constructing profiles.
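The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' exact formula, which the abstract does not state. It assumes that the total pseudo-count for a column is proportional to the number of distinct residues observed there, and that this total is distributed according to the substitution probabilities of the observed residues. The function name `pseudocount_column`, the scaling constant `m`, the three-letter toy alphabet, and the values in `COND` are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def pseudocount_column(column, cond_prob, alphabet, m=5.0):
    """Return smoothed residue probabilities for one alignment column.

    column    -- string of residues observed in the column (gaps excluded)
    cond_prob -- cond_prob[i][a]: assumed probability that residue i is
                 replaced by residue a (each row sums to 1)
    alphabet  -- all residues that should receive a probability
    m         -- hypothetical scaling constant for the total pseudo-count
    """
    counts = Counter(column)
    n_total = sum(counts.values())
    diversity = len(counts)        # number of distinct residues observed
    b_total = m * diversity        # more diverse column -> more pseudo-counts
    probs = {}
    for a in alphabet:
        # Share of the pseudo-counts given to residue a, weighted by how
        # often each observed residue i occurs and how likely i -> a is.
        b_a = b_total * sum(
            (counts[i] / n_total) * cond_prob[i][a] for i in counts
        )
        probs[a] = (counts[a] + b_a) / (n_total + b_total)
    return probs


# Toy alphabet with made-up substitution probabilities (illustration only).
ALPHABET = "ILK"
COND = {
    "I": {"I": 0.70, "L": 0.25, "K": 0.05},
    "L": {"I": 0.25, "L": 0.70, "K": 0.05},
    "K": {"I": 0.05, "L": 0.05, "K": 0.90},
}

probs = pseudocount_column("IIIL", COND, ALPHABET)
```

In this toy column, K is never observed but still receives a small non-zero probability, while L, which substitutes readily for I in the made-up table, receives more weight than its single observation alone would give it.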
Similar resources
Scoring Amino Acid Substitutions In Φhage Genomes
Substitution matrices are among the most widely used scoring techniques: BLAST, MUSCLE, and other alignment packages all use them. However, these matrices are general; they ignore organism-specific properties and do not provide customized scoring schemes. We present a Φhage-specific scoring matrix based on the abundances of aligned substitutions. These matrices use information from approximatel...
Fold-specific substitution matrices for protein classification
MOTIVATION: Methods that focus on secondary structures, such as Position Specific Scoring Matrices and Hidden Markov Models, have proved useful for assigning proteins to families. However, for assigning proteins to an attribute class within a family these methods may introduce more free parameters than are needed. There are fewer members and there is less variability among sequences within a fam...
Bayesian Markov models consistently outperform PWMs at predicting motifs in nucleotide sequences
Position weight matrices (PWMs) are the standard model for DNA and RNA regulatory motifs. In PWMs nucleotide probabilities are independent of nucleotides at other positions. Models that account for dependencies need many parameters and are prone to overfitting. We have developed a Bayesian approach for motif discovery using Markov models in which conditional probabilities of order k - 1 act as ...
On the significance of sequence alignments when using multiple scoring matrices
MOTIVATION: Pairwise local sequence alignment is commonly used to search databases for sequences related to some query sequence. Alignments are obtained using a scoring matrix that takes into account the different frequencies of occurrence of the various types of amino acid substitutions. Software like BLAST provides the user with a set of scoring matrices available to choose from, and in the l...
Pattern of Amino Acid Substitutions in Transmembrane Domains of β-Barrel Membrane Proteins for Detecting Remote Homologs in Bacteria and Mitochondria
β-barrel membrane proteins play an important role in controlling the exchange and transport of ions and organic molecules across bacterial and mitochondrial outer membranes. They are also major regulators of apoptosis and are important determinants of bacterial virulence. In contrast to α-helical membrane proteins, their evolutionary pattern of residue substitutions has not been quantified, and...
Journal title:
- Computer applications in the biosciences : CABIOS
Volume 12, Issue 2
Pages -
Publication date: 1996